Abstract: We describe several algorithms for matrix completion and matrix approximation when only some of the entries are known. The approximation constraint can be any constraint whose approximate solution is known for the full matrix. For low rank approximations, similar algorithms have appeared recently in the literature under different names. In this work, we introduce new theorems for matrix approximation and show that these algorithms can be extended to handle constraints other than low rank, such as nuclear norm, spectral norm and orthogonality constraints. As the algorithms can be viewed from an optimization point of view, we discuss their convergence to the global solution in the convex case. We also discuss the optimal step size and show that it is fixed in each iteration. In addition, the derived matrix completion flow is robust and does not require any parameters. This matrix completion flow is applicable to different spectral minimizations and can be applied to physics, mathematics and electrical engineering problems, such as data reconstruction of images and of data coming from PDEs such as the Helmholtz equation used for electromagnetic waves.

I. Introduction 

Matrix completion and matrix approximation are important problems in a variety of fields such as statistics [1], biology [2], statistical machine learning [3], signal processing and computer vision/image processing [4]. Rank reduction by matrix approximation is important, for example, in compression, where low rank indicates the existence of redundant information. Matrix completion is important in collaborative filtering, such as the Netflix problem, and in different reconstruction problems. Usually, the matrix completion problem is defined as finding a matrix of the smallest possible rank that agrees with a given set of known entries:



$$
\begin{aligned}
&\text{minimize} && \operatorname{rank}(X)\\
&\text{subject to} && X_{ij} = M_{ij},\quad (i,j) \in \Omega.
\end{aligned}
\tag{1.1}
$$



Since Eq. (1.1) is an NP-hard problem, several relaxation methods have been proposed. The most popular relaxation replaces the rank by the nuclear norm:



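$$
\begin{aligned}
&\text{minimize} && \|X\|_*\\
&\text{subject to} && X_{ij} = M_{ij},\quad (i,j) \in \Omega,
\end{aligned}
\tag{1.2}
$$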



where $\|X\|_*$ denotes the nuclear norm of $X$, i.e., the sum of the singular values of $X$. A small value of $\|X\|_*$ is related to the property of having a low rank [5]. An iterative solution based on singular value thresholding is given in [6]. A completion algorithm based on local information of the matrix is proposed in [7]. In this work, we discuss a simpler and more robust approach for solving a variety of matrix approximation problems on certain entries, which relies on approximating the full matrix. We approximate problems of the form



$$
\begin{aligned}
&\text{minimize} && \|\mathcal{P}_\Omega X - \mathcal{P}_\Omega M\|_F\\
&\text{subject to} && f(X) \le 0,
\end{aligned}
\tag{1.3}
$$

given that the solution for

$$
\begin{aligned}
&\text{minimize} && \|X - M\|_F\\
&\text{subject to} && f(X) \le 0
\end{aligned}
\tag{1.4}
$$

is known. Here, $(\mathcal{P}_\Omega X)_{ij} = X_{ij}$ if $(i,j) \in \Omega$ and $0$ otherwise. If $f(X)$ is convex and satisfies some conditions (explained in the following sections), the algorithm finds the global solution. Otherwise, convergence is still guaranteed, but only to a local solution. We then show how this algorithm can be used to solve a variety of matrix completion problems as well, such as spectral norm completion:



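$$
\begin{aligned}
&\text{minimize} && \|X\|_2\\
&\text{subject to} && X_{ij} = M_{ij},\quad (i,j) \in \Omega,
\end{aligned}
\tag{1.5}
$$

Ky-Fan norm completion:

$$
\begin{aligned}
&\text{minimize} && \|X\|_{(k)}\\
&\text{subject to} && X_{ij} = M_{ij},\quad (i,j) \in \Omega,
\end{aligned}
\tag{1.6}
$$

where $\|X\|_{(k)} = \sum_{i=1}^{k} \sigma_i$ (the sum of the $k$ largest singular values). Note that the spectral norm and the nuclear norm are special cases of the Ky-Fan norm ($k = 1$ and $k = n$, respectively). We also discuss approximation problems such as: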



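$$
\begin{aligned}
&\text{minimize} && \|\mathcal{P}_\Omega X - \mathcal{P}_\Omega M\|_F\\
&\text{subject to} && X^*X = I.
\end{aligned}
\tag{1.7}
$$

II. Theorems on full matrix approximation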



The algorithm that approximates a matrix at certain entries requires the ability to approximate the matrix when all of its entries are taken into account. Therefore, in addition to the well-known Eckart-Young theorem mentioned in the introduction, we review some theorems on full matrix approximation. The low rank approximation problem can be modified to approximate a matrix under the Frobenius norm while using the Frobenius norm, instead of the rank, as the constraint. Formally,



$$
\begin{aligned}
&\text{minimize} && \|X - M\|_F^2\\
&\text{subject to} && \|X\|_F \le \lambda.
\end{aligned}
\tag{II.1}
$$

A solution for Eq. (II.1) is given by $X = \frac{M}{\|M\|_F}\min(\|M\|_F, \lambda)$.

Proof: The constraint $\|X\|_F^2 \le \lambda^2$ can be thought of as an $m \times n$-dimensional ball of radius $\lambda$ centered at the origin, and $M$ is a point in the same $m \times n$-dimensional space. We are looking for a point $X$ in the ball $\|X\|_F^2 \le \lambda^2$ that has minimal Euclidean (Frobenius) distance from $M$. If $\|M\|_F \le \lambda$, then $X = M$, which lies inside the ball at distance zero. If $\|M\|_F > \lambda$, then the closest point to $M$ is the intersection of the line from the origin to $M$ with the sphere $\|X\|_F = \lambda$, namely $X = \lambda \frac{M}{\|M\|_F}$.

■ 
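As a quick illustration of Eq. (II.1), the following NumPy sketch computes this radial projection onto the Frobenius ball. The function name `project_frobenius_ball` is hypothetical, chosen here for illustration, and the code assumes real-valued dense matrices.

```python
import numpy as np


def project_frobenius_ball(M, lam):
    """Closest point to M in the ball ||X||_F <= lam, i.e. a solution of Eq. (II.1).

    Equivalent to X = M / ||M||_F * min(||M||_F, lam), written so that M = 0
    does not cause a division by zero.
    """
    norm = np.linalg.norm(M, "fro")
    if norm <= lam:
        return M.copy()          # M is feasible, so it is its own best approximation
    return M * (lam / norm)      # otherwise move M radially onto the sphere ||X||_F = lam


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))
X = project_frobenius_ball(M, 1.0)
print(np.linalg.norm(X, "fro"))  # equals 1.0 up to rounding whenever ||M||_F > 1
```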

An alternative approach uses Lagrange multipliers in a brute-force manner. This leads to a nonlinear system of equations that is difficult to solve. Note that this problem can easily be extended to the general case

$$
\begin{aligned}
&\text{minimize} && \|\mathcal{P}X - \mathcal{P}M\|_F\\
&\text{subject to} && \|X\|_F \le \lambda,
\end{aligned}
\tag{II.2}
$$

where $\mathcal{P}$ is an orthogonal projection such as $\mathcal{P}_\Omega$.

Proof: The proof is similar to the previous one, but here we are looking for a point $X$ in the ball that is closest to the set of points $X'$ satisfying $\mathcal{P}X' = \mathcal{P}M$. By geometrical considerations, this point is given by $X = \frac{\mathcal{P}M}{\|\mathcal{P}M\|_F}\min(\|\mathcal{P}M\|_F, \lambda)$.

Hence, we have shown a closed-form solution for the problem in Eq. (II.2).
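A minimal NumPy sketch of the closed form for Eq. (II.2), assuming $\mathcal{P}$ is the entry-selection projection $\mathcal{P}_\Omega$ represented by a hypothetical boolean array `mask` (True on known entries). The unobserved entries do not affect the objective but would consume norm budget, so the sketch sets them to zero and scales the observed part as in Eq. (II.1).

```python
import numpy as np


def masked_frobenius_ball(M, mask, lam):
    """Minimizer of ||P(X) - P(M)||_F s.t. ||X||_F <= lam, where P keeps the
    entries with mask == True and zeroes the rest (P = P_Omega)."""
    PM = np.where(mask, M, 0.0)          # P(M)
    norm = np.linalg.norm(PM, "fro")
    if norm <= lam:
        return PM                         # P(M) itself is feasible: zero error
    return PM * (lam / norm)              # scale P(M) onto the sphere ||X||_F = lam
```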

Another example is the solution to the problem: 



$$
\begin{aligned}
&\text{minimize} && \|X - M\|_F\\
&\text{subject to} && X^*X = I.
\end{aligned}
\tag{II.3}
$$

This is known as the orthogonal Procrustes problem [8], and its solution is given by $X = UV^*$, where $M = USV^*$ is the SVD of $M$. The solution can be extended to a matrix $X$ satisfying $X^*X = D^2$, where $D$ is a known or unknown diagonal matrix. When $D$ is known, the problem can be converted to the orthonormal case (Eq. (II.3)) by substituting $X = VD$, where $V^*V = I$. When $D$ is unknown, the solution is the best matrix with orthogonal (not necessarily orthonormal) columns, and the problem can be solved by the iterative algorithm described in [9].
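The Procrustes solution, and the known-$D$ reduction described above, can be sketched in NumPy as follows. The function names are hypothetical. The reduction for known $D$ follows from expanding $\|VD - M\|_F^2 = \operatorname{trace}(D^2) - 2\,\mathrm{Re}\,\operatorname{trace}(V^*MD) + \|M\|_F^2$, so $V$ must solve the orthonormal Procrustes problem for $MD$. The unknown-$D$ iterative algorithm of [9] is not reproduced here.

```python
import numpy as np


def procrustes(M):
    """Nearest matrix with orthonormal columns (X^* X = I) to M in Frobenius norm,
    Eq. (II.3): if M = U S V^* is the thin SVD, the minimizer is X = U V^*."""
    U, _, Vh = np.linalg.svd(M, full_matrices=False)
    return U @ Vh


def procrustes_scaled(M, d):
    """Known-D case X^* X = D^2 with D = diag(d): substitute X = V D, V^* V = I,
    where V is the Procrustes solution for M D."""
    V = procrustes(M * d)        # M * d scales column j of M by d[j], i.e. M @ diag(d)
    return V * d                 # X = V @ diag(d)
```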

We now examine the following problem: 



$$
\begin{aligned}
&\text{minimize} && \|X - M\|_F\\
&\text{subject to} && \|X\|_2 \le \lambda.
\end{aligned}
\tag{II.4}
$$



A solution to this problem uses the pinching theorem [10]:

Lemma II.1 (Pinching theorem). For every matrix $X$ and every norm that is invariant under unitary similarity, i.e., $\|UXU^*\| = \|X\|$ for every unitary matrix $U$, we have $\|\operatorname{diag}(X)\| \le \|X\|$.

A proof is given in [12]. An alternative proof is given in



Lemma II.2 (Minimization of the Frobenius norm under the spectral norm constraint). Assume the SVD of $M$ is given by $M = USV^*$, where $S = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$. Then the matrix $X$ that minimizes $\|X - M\|_F$ such that $\|X\|_2 \le \lambda$ is given by $X = U\tilde{S}V^*$, where $\sigma_i$ are the singular values of $S$, $\tilde{S} = \operatorname{diag}(\tilde\sigma_i)$ and $\tilde{\sigma}_i = \min(\sigma_i, \lambda)$, $i = 1, \ldots, k$, $k \le n$.



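Proof: Since $U$ and $V$ are unitary, $\|X - M\|_F = \|X - USV^*\|_F = \|U^*XV - S\|_F$. Since $S$ is diagonal, $\|\operatorname{diag}(U^*XV) - S\|_F \le \|U^*XV - S\|_F$, because the off-diagonal entries of $U^*XV$ only add to the error. From Lemma II.1 we know that $\|\operatorname{diag}(U^*XV)\|_2 \le \|U^*XV\|_2 = \|X\|_2$, so replacing $U^*XV$ by its diagonal keeps the constraint satisfied. Therefore, $U^*XV$ has to be diagonal, and the best minimizer under the spectral norm constraint is achieved by minimizing each diagonal element separately, yielding $U^*XV = \operatorname{diag}(\min(\sigma_i, \lambda))$, $i = 1, \ldots, k$, $k \le n$. Hence, $X = U\tilde{S}V^*$. ■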
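Lemma II.2 translates into an operator that clips the singular values. The sketch below is a NumPy illustration for real matrices; `clip_spectral` is a hypothetical name.

```python
import numpy as np


def clip_spectral(M, lam):
    """Lemma II.2: minimizer of ||X - M||_F subject to ||X||_2 <= lam,
    obtained by replacing each singular value sigma_i of M by min(sigma_i, lam)."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (U * np.minimum(s, lam)) @ Vh     # U * s scales the columns of U, i.e. U @ diag(s)
```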

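The same argument, which shows that $U^*XV$ has to be diagonal, can also be applied when the constraint is given by the nuclear norm. Define $\tilde{S} = U^*XV = \operatorname{diag}(\tilde\sigma_i)$. We wish to minimize

$$
\|S - \tilde{S}\|_F^2 = \sum_i (\sigma_i - \tilde\sigma_i)^2 \quad \text{s.t.} \quad \|X\|_* = \|\tilde{S}\|_* = \sum_i |\tilde\sigma_i| \le \lambda, \quad i = 1, \ldots, k,\ k \le n.
$$

Note that each $\tilde\sigma_i$ has to be nonnegative: since $\sigma_i \ge 0$, a negative $\tilde\sigma_i$ increases the Frobenius error compared with $|\tilde\sigma_i|$ but does not change the nuclear norm. Hence, the problem can now be formulated as:

$$
\begin{aligned}
&\text{minimize} && \sum_i (\sigma_i - \tilde\sigma_i)^2\\
&\text{subject to} && \sum_i \tilde\sigma_i \le \lambda,\\
& && \tilde\sigma_i \ge 0.
\end{aligned}
\tag{II.5}
$$

This is a standard convex optimization problem that can be solved by methods such as semidefinite programming [11]. The same can be done for the Ky-Fan norm.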
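Instead of a general-purpose convex solver, Eq. (II.5) can also be solved in closed form: when $\sum_i \sigma_i > \lambda$, its solution is $\tilde\sigma_i = \max(\sigma_i - \theta, 0)$ with the threshold $\theta$ chosen so that $\sum_i \tilde\sigma_i = \lambda$, which is the standard sort-based Euclidean projection onto the simplex. This technique is swapped in here for illustration and is not the method prescribed above; the NumPy sketch assumes $\lambda > 0$ and real matrices, and the function names are hypothetical.

```python
import numpy as np


def project_simplex_budget(sigma, lam):
    """Solve Eq. (II.5): minimize sum_i (sigma_i - t_i)^2 s.t. t_i >= 0, sum_i t_i <= lam.

    sigma is a nonnegative vector (singular values) and lam > 0 is assumed.
    """
    if sigma.sum() <= lam:
        return sigma.copy()                   # constraint inactive: keep the singular values
    u = np.sort(sigma)[::-1]                  # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    # Index of the largest k with u_k - (sum_{j<=k} u_j - lam) / k > 0.
    rho = np.nonzero(u - (css - lam) / k > 0)[0][-1]
    theta = (css[rho] - lam) / (rho + 1)      # soft-threshold level; result sums to lam
    return np.maximum(sigma - theta, 0.0)


def project_nuclear_ball(M, lam):
    """Minimizer of ||X - M||_F subject to ||X||_* <= lam."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (U * project_simplex_budget(s, lam)) @ Vh
```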

III. Approximation of certain entries 

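Suppose we wish to approximate only certain entries of the matrix under different constraints, i.e., we are interested in solving Eq. (1.3), given that the solution of Eq. (1.4) is known and given by $\mathcal{D}M$, where $\mathcal{D}$ is the solution operator. For example, if the constraint is $\operatorname{rank}(X) \le k$, then $\mathcal{D}X$ is the truncated SVD of $X$ containing its $k$ largest singular values. Note that $\mathcal{D}$ is not necessarily convex (the rank constraint, for instance, defines a nonconvex set). We examine the following iterative algorithm:

$$
X_{n+1} = \mathcal{D}\big(X_n - \mathcal{P}_\Omega(X_n - M)\big).
\tag{III.1}
$$

Eq. (III.1) can be considered as a projected gradient algorithm with unit step size, where the projection is given by $\mathcal{D}$: the gradient of $f(X) = \frac{1}{2}\|\mathcal{P}_\Omega X - \mathcal{P}_\Omega M\|_F^2$ is $\mathcal{P}_\Omega(X - M)$.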
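A minimal NumPy sketch of iteration (III.1), assuming a boolean `mask` for $\Omega$ and using the rank constraint (truncated SVD) as an example solution operator $\mathcal{D}$. The starting point $X_0 = \mathcal{D}(\mathcal{P}_\Omega M)$, the fixed iteration count and the function names are illustrative assumptions, not part of the text. Note that $X_n - \mathcal{P}_\Omega(X_n - M)$ simply overwrites the known entries of $X_n$ with those of $M$.

```python
import numpy as np


def truncated_svd(X, k):
    """Solution operator D for the constraint rank(X) <= k (keeps the k largest singular values)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k]


def approximate_entries(M, mask, D, iters=200):
    """Iteration (III.1): X_{n+1} = D(X_n - P(X_n - M)) with unit step size.
    Returns the final iterate and the error history ||P X_n - P M||_F."""
    P = lambda A: np.where(mask, A, 0.0)       # entry-selection projection P_Omega
    X = D(P(M))                                # illustrative starting point
    errors = [np.linalg.norm(P(X - M))]
    for _ in range(iters):
        X = D(X - P(X - M))                    # gradient step on f, then apply D
        errors.append(np.linalg.norm(P(X - M)))
    return X, errors
```

The recorded error history can be used to observe the monotone decrease stated in Theorem III.1.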

Theorem III.1 (Local Convergence). Let $e(X_n) = \|\mathcal{P}_\Omega X_n - \mathcal{P}_\Omega M\|_F$ be the error at the $n$th iteration. Then $e(X_n)$ is monotonically decreasing, and because it is bounded from below by zero, the algorithm converges.

The proof of Theorem III.1 is given in [14]. Theorem III.1 does not say anything about convergence to the global solution.
However, suppose the projection $\mathcal{D}$ is convex and self-adjoint ($\mathcal{D} = \mathcal{D}^*$) and the algorithm is modified to have an adaptive step size, that is,

$$
X_{n+1} = \mathcal{D}\big(X_n - \mu_n \mathcal{P}_\Omega(X_n - M)\big),
\tag{III.2}
$$

where $\mu_n = 2^{-l[n]}$ is computed by the Armijo rule in a greedy form, minimizing the error in every iteration:

$$
\begin{aligned}
l[n] &= \min\Big\{ l \in \mathbb{Z},\ l \ge 0 \;:\; f(Z_{n,l}) \le f(X_n) - \sigma \operatorname{trace}\big(\nabla f(X_n)^* (X_n - Z_{n,l})\big) \Big\},\\
Z_{n,l} &= \mathcal{D}\big(X_n - 2^{-l}\,\nabla f(X_n)\big),
\end{aligned}
\tag{III.3}
$$

where $f(X) = \frac{1}{2}\|\mathcal{P}_\Omega X - \mathcal{P}_\Omega M\|_F^2$ and $\sigma \in (0,1)$. Then the algorithm is guaranteed to achieve the global solution [13]. This approach has two major problems:

• For the cases of interest, the operators that truncate the nuclear and spectral norms are not self-adjoint ($\mathcal{D} \ne \mathcal{D}^*$).

• This approach requires applying the Armijo rule in every iteration. This means several applications of the operator $\mathcal{D}$ in each iteration, which is usually computationally expensive.
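To make the cost of the Armijo variant concrete, the following NumPy sketch performs one step of (III.2)-(III.3) with trial steps $\mu = 2^{-l}$; each trial calls $\mathcal{D}$ once. The parameter values (`sigma=0.1`, at most 30 halvings), the boolean `mask` representation of $\Omega$ and the fallback of returning $X_n$ unchanged are assumptions made for this sketch.

```python
import numpy as np


def armijo_step(X, M, mask, D, sigma=0.1, max_halvings=30):
    """One iteration of Eqs. (III.2)-(III.3) for f(X) = 1/2 ||P(X) - P(M)||_F^2.

    Tries mu = 2^{-l} for l = 0, 1, ... and accepts the first trial point
    Z = D(X - mu * grad f(X)) satisfying the Armijo condition
        f(Z) <= f(X) - sigma * <grad f(X), X - Z>.
    Returns (Z, mu); if no trial is accepted, returns (X, 0.0).
    """
    P = lambda A: np.where(mask, A, 0.0)      # entry-selection projection P_Omega
    f = lambda A: 0.5 * np.linalg.norm(P(A - M)) ** 2
    grad = P(X - M)                           # gradient of f at X
    fX = f(X)
    for l in range(max_halvings + 1):
        mu = 2.0 ** (-l)
        Z = D(X - mu * grad)                  # every trial costs one application of D
        if f(Z) <= fX - sigma * np.sum(grad * (X - Z)):
            return Z, mu
    return X, 0.0
```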

As for the first point, requiring the projection $\mathcal{D}$ to be self-adjoint can be more than is needed for the global convergence proof in [13]. This requirement is needed in order to satisfy $\langle X - Y, X - \mathcal{D}X \rangle \ge 0$ for $Y = \mathcal{D}Y$, which always holds when $\mathcal{D} = \mathcal{D}^*$, but also holds when $\mathcal{D}$ is the operator defined in Lemma II.2 and Eq. (II.5).



Theorem III.2. Let $\mathcal{D}$ be the following projection (defined as in Lemma II.2): given the SVD $X = USV^*$, we define $\mathcal{D}X = U\tilde{S}V^*$, where $\tilde{s}_i = \min(s_i, \lambda)$. Then, for all matrices $X$ and $Y$ such that $Y = \mathcal{D}Y$, $\langle X - Y, X - \mathcal{D}X \rangle \ge 0$.

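Proof: The condition $\langle X - Y, X - \mathcal{D}X \rangle \ge 0$ can be reformulated as

$$
\langle X, X - \mathcal{D}X \rangle \ge \langle Y, X - \mathcal{D}X \rangle,
\tag{III.4}
$$

where $\|Y\|_2 \le \lambda$.

First, note that the value of the right-hand side is maximal when $Y$ and $X - \mathcal{D}X$ share the same singular vectors (by von Neumann's trace inequality, a matrix analogue of the Cauchy-Schwarz inequality). Hence, we define $X = US_XV^*$, $Y = US_YV^*$ and $\mathcal{D}X = U\tilde{S}_XV^*$. The tilde indicates that the singular values of $\tilde{S}_X$ are smaller than or equal to $\lambda$.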

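We start by evaluating the left side of Eq. (III.4):

$$
\langle X, X - \mathcal{D}X \rangle = \operatorname{trace}\big[S_X(S_X - \tilde{S}_X)\big] = \sum_i s_{x_i}(s_{x_i} - \tilde{s}_{x_i}).
\tag{III.5}
$$

Now, for $s_{x_i} \le \lambda$ we get $s_{x_i} - \tilde{s}_{x_i} = 0$. Hence, only the terms with $s_{x_i} > \lambda$ contribute to the sum, and the expression can be rewritten as $\langle X, X - \mathcal{D}X \rangle = \sum_{s_{x_i} > \lambda} s_{x_i}(s_{x_i} - \tilde{s}_{x_i})$. We now examine the right side of Eq. (III.4):

$$
\langle Y, X - \mathcal{D}X \rangle = \operatorname{trace}\big[S_Y(S_X - \tilde{S}_X)\big] = \sum_i s_{y_i}(s_{x_i} - \tilde{s}_{x_i}).
\tag{III.6}
$$

Again, the only elements that contribute to the sum are those for which $s_{x_i} > \lambda$. Hence, on the right side we obtain $\langle Y, X - \mathcal{D}X \rangle = \sum_{s_{x_i} > \lambda} s_{y_i}(s_{x_i} - \tilde{s}_{x_i})$.

Both expressions can be thought of as sums of the positive elements $(s_{x_i} - \tilde{s}_{x_i})$ with different coefficients. Both sums run over the same indices (those with $s_{x_i} > \lambda$), but the coefficients on the left side are $s_{x_i} > \lambda$, while the coefficients on the right side are, by definition (since $\|Y\|_2 \le \lambda$), at most $\lambda$. Therefore, the left sum is greater than or equal to the right sum. This completes the proof. ■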
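Theorem III.2 can be sanity-checked numerically; this is a random test, not a proof. The hypothetical helper below samples random $X$ and feasible $Y$ (any $Y$ with $\|Y\|_2 \le \lambda$ satisfies $Y = \mathcal{D}Y$) and reports the smallest observed value of $\langle X - Y, X - \mathcal{D}X \rangle$, which should be nonnegative up to rounding.

```python
import numpy as np


def clip_spectral(X, lam):
    """The operator D of Theorem III.2: clip the singular values of X at lam."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U * np.minimum(s, lam)) @ Vh


def check_theorem_III2(m=6, n=4, lam=1.0, trials=1000, seed=0):
    """Smallest observed value of <X - Y, X - DX> over random X and Y = DY."""
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(trials):
        X = 2.0 * rng.standard_normal((m, n))
        Y = clip_spectral(rng.standard_normal((m, n)), lam)   # ||Y||_2 <= lam, hence Y = DY
        val = np.sum((X - Y) * (X - clip_spectral(X, lam)))   # Frobenius inner product
        worst = min(worst, val)
    return worst


print(check_theorem_III2())   # expected to be >= 0 up to rounding
```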
This means that for the spectral norm, the algorithm converges to the global solution. The same proof can be carried out for the nuclear norm and the Ky-Fan norm, showing that the algorithm converges to the global solution in these cases as well.

Theorem III.3 (Optimal step size). For the matrix approximation problem (Eq. (1.3)) with convex $\mathcal{D}$, the optimal step size is $\mu_n = 1$.

The proof of Theorem III.3 is given in [14]. Note that this holds for any projected gradient scheme involving orthogonal axes. Together with the global convergence result above, Theorem III.3 implies that in our case, when the constraint and the projection are convex, the unit-step iteration (III.1) converges to the global solution. This means that we can now solve a variety of matrix approximation problems at a reasonable computational cost. Note that we have shown that in some cases the global solution is achieved even when the projection is not self-adjoint (orthogonal). The next section shows how this very simple algorithm can be applied to matrix completion problems as well.

IV. Matrix Completion 

Matrix completion is an important problem that has been investigated extensively. The matrix completion problem differs from the matrix approximation problem in that the known entries must remain fixed: they move from the objective function to the constraints. A well-investigated matrix completion problem, rank minimization, appears in the introduction. Because rank minimization is nonconvex and NP-hard, it is usually relaxed to nuclear norm minimization. Since we have seen that in the convex case Eq. (III.1) converges to the global solution, matrix completion can be achieved simply by a binary search on the constraint bound $\lambda$. The advantage of this approach over other approaches, such as those that minimize the nuclear norm, is that it is general and can be applied to problems that have not been addressed, such as minimizing the spectral norm. Moreover, some algorithms, such as Singular Value Thresholding (SVT) [6], require additional parameters $\tau$ and $\delta$ that affect the convergence and the final result, whereas this approach requires no external parameters (except for a tolerance for determining convergence).
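The binary search on $\lambda$ can be sketched in NumPy for the nuclear norm case, combining iteration (III.1) with a projection onto the ball $\|X\|_* \le \lambda$ (computed here by soft-thresholding the singular values). This is a simplified sketch under stated assumptions: the fixed number of inner iterations, the zero starting point, the boolean `mask` for $\Omega$ and all function names are hypothetical, and the loop stops once the bracket $[\lambda_{\min}, \lambda_{\max}]$ is narrower than `lam_tol`.

```python
import numpy as np


def project_nuclear_ball(X, lam):
    """Projection onto ||X||_* <= lam (lam > 0): soft-threshold the singular values
    so that they sum to at most lam."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if s.sum() > lam:
        u = np.sort(s)[::-1]
        css = np.cumsum(u)
        k = np.arange(1, len(u) + 1)
        rho = np.nonzero(u - (css - lam) / k > 0)[0][-1]
        s = np.maximum(s - (css[rho] - lam) / (rho + 1), 0.0)
    return (U * s) @ Vh


def approximate(PM, mask, lam, iters=300):
    """Inner solver: iteration (III.1) with D = projection onto ||X||_* <= lam."""
    X = np.zeros_like(PM)
    for _ in range(iters):
        # X - P(X - M) keeps X off Omega and copies the known entries on Omega.
        X = project_nuclear_ball(np.where(mask, PM, X), lam)
    return X


def complete(M, mask, tol=1e-3, lam_tol=1e-3):
    """Binary search on lam in the spirit of Algorithm IV.1 (nuclear norm case)."""
    PM = np.where(mask, M, 0.0)
    lam_min, lam_max = 0.0, np.linalg.norm(PM, "nuc")   # PM itself fits with zero error
    X = PM
    while lam_max - lam_min > lam_tol:
        lam = 0.5 * (lam_min + lam_max)
        Y = approximate(PM, mask, lam)
        error = np.linalg.norm(np.where(mask, Y - M, 0.0))
        if error > tol:
            lam_min = lam                # budget too small to fit the known entries
        else:
            lam_max, X = lam, Y          # fits: keep Y and try a smaller budget
    return X
```

Returning the last iterate that met the error tolerance, rather than the last one computed, guarantees that the known entries are reproduced to within `tol`; this is a design choice of the sketch.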



This approach is detailed in Algorithm IV.1, which is robust and does not require any tuning other than a tolerance threshold for determining convergence. Algorithm IV.1 can be used for matrix completion under a variety of constraints. Fig. IV.2 shows the results of Algorithm IV.1 on a corrupted image. In the corrupted image, squares of size 3×3 were randomly removed, destroying 18% of the image. The reconstruction is more difficult since the damage is in squares rather than in scattered points. The nuclear norm of the original image is 51,625, that of the corrupted image is 96,500, and that of the completed matrix is 50,418. Minimizing the nuclear norm for image reconstruction is a well-known method, as images usually have a low numerical rank: their singular values decay very fast. Fig. IV.1 shows that the singular values of the reconstructed image are almost identical to those of the original.

Algorithm IV.1: Matrix Completion using Nuclear Norm / Spectral Norm Minimization

Input: $M$ - matrix to complete, $\mathcal{P}_\Omega$ - projection operator that specifies the known entries, $tol$ - admissible approximation error, $\lambda_{tol}$ - admissible constraint accuracy
Output: $X$ - completed matrix
1: $M \leftarrow \mathcal{P}_\Omega M$
2: $\lambda_{\min} \leftarrow 0$
3: $\lambda_{\max} \leftarrow \|M\|_*$ (or $\|M\|_2$ for the spectral norm)
4: $\lambda \leftarrow \lambda_{\max}$
5: repeat
6: $\lambda_{prev} \leftarrow \lambda$
7: $\lambda \leftarrow (\lambda_{\min} + \lambda_{\max})/2$
8: $X \leftarrow$ approximate $\mathcal{P}_\Omega M$ s.t. $\|X\|_* \le \lambda$ (or $\|X\|_2 \le \lambda$ for the spectral norm case)
9: $error \leftarrow \|\mathcal{P}_\Omega X - \mathcal{P}_\Omega M\|_F$
10: if $error > tol$ then
11: $\lambda_{\min} \leftarrow \lambda$
12: else
13: $\lambda_{\max} \leftarrow \lambda$
14: end if
15: until $error < tol$ and $|\lambda - \lambda_{prev}| < \lambda_{tol}$
16: return $X$

Fig. IV.1. Singular values comparison between the different images.
Fig. IV.2. Corrupted dog image and the reconstructed image.